Characterizing Farmers and Farming Systems in the Kilombero Valley Floodplain, Tanzania

1 Introduction

This notebook contains the code and results of the analysis conducted to characterize farmers and farming systems in the Kilombero Valley floodplain. It is structured into four main sections. The first section describes the data collection methods and instruments used for this study. The second section reports the main socio-economic characteristics of the surveyed farmers. The third section presents the results of the typology analysis based on this survey. The fourth section presents a second typology study, based on the 2007 Agriculture Sample Survey of the Tanzanian Government, to validate the clusters that emerged from our own data and to assess their stability.

2 Data Source

The main data source is a household survey conducted in the Kilombero floodplain in Tanzania between November and December 2015 as part of the GLOBE project “Reconciling future food production and environmental sustainability in East African wetlands”. The survey was carried out in 21 villages in two districts (Ulanga and Kilombero) of the Kilombero Valley. In total, 304 farm households were interviewed on a wide range of topics designed to characterize the farming system in terms of resource availability and use and livelihood sources. [For this survey, a farm household is defined as individuals who live together, share meals and pool some or all of their income, and who cultivate land or keep livestock.]

Households were selected using a multi-stage sampling strategy. In the first stage, 12 wards were selected purposively based on the availability of floodplain farming. In the second stage, 21 villages were selected randomly within these wards. In the final stage, households were selected randomly from the list provided by each village leader. The number of interviewees per village ranges from 5 in smaller villages to 15 in the biggest. A GIS coverage incorporating the land use map from GLC30 and the administrative boundaries and census data from the Tanzanian statistics office was used to estimate the boundaries and total population size of the study area.

The primary data were collected using a standard questionnaire that solicited information on aspects of rural livelihoods: demographic details of the household; land use, land ownership and acreage; labor use; physical quantities of crop outputs and how households use them as a source of food or income; ownership of various assets; responses to shocks; the embeddedness of households in social networks and institutions; and future prospects and plans. The questionnaire was administered in one-to-one interviews by 5 well-trained enumerators (university graduates from Sokoine University of Agriculture) who understand the farming context and the local language. The questionnaire was originally written in English and translated into Swahili during the interview by the field assistants. A pre-test survey was also conducted to assess how well the field assistants could administer the questionnaire and how farmers understood the questions. Some questions and answer options were subsequently modified to match the villagers' understanding.
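The three sampling stages described above can be sketched in R as follows. This is a minimal illustration under stated assumptions, not the procedure actually run in the field: the village frame, the household counts and the linear rule that scales the per-village sample between 5 and 15 are all hypothetical.

```r
set.seed(42)

# Stage 1: the 12 wards are chosen purposively, so they are simply listed here
wards <- paste0("ward_", 1:12)

# Hypothetical village frame: 4 villages per ward with made-up household counts
villages <- data.frame(
  ward    = rep(wards, each = 4),
  village = paste0("village_", 1:48),
  n_hh    = sample(80:600, 48, replace = TRUE)
)

# Stage 2: draw 21 villages at random from the selected wards
chosen <- villages[sample(nrow(villages), 21), ]

# Stage 3: per-village sample size, scaled from 5 (smallest) to 15 (largest village)
rng <- range(chosen$n_hh)
chosen$n_sample <- round(5 + 10 * (chosen$n_hh - rng[1]) / (rng[2] - rng[1]))

# Draw households without replacement from each village list (ids 1..n_hh)
sampled <- do.call(rbind, lapply(seq_len(nrow(chosen)), function(i) {
  data.frame(village = chosen$village[i],
             hh_id   = sample(chosen$n_hh[i], chosen$n_sample[i]))
}))
```

Drawing villages with a simple random sample is only one reading of "selected randomly within the wards"; a design that guarantees at least one village per ward would need a stratified draw instead.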

3 Socio-economic Characteristics of Households

3.1 Household Demographics

Demographic variables examined include the age, sex and marital status of the household head, family size and composition, and the level of education. Surveyed households were predominantly headed by men, with only 16 percent headed by women. Household heads were, on average, 46 years old, and 77 percent of them were married. The women who head households are mostly (92 percent) widowed, divorced or separated. The education level of household heads was low: 7 percent lacked any formal education and a further 83 percent had completed only primary school. The average household size for the entire sample was 5 (SD = 2.18, n = 304), with a minimum of 2 and a maximum of 11 members. Forty-four percent of respondents have a family of fewer than 4 members, which can be considered a small family; 41 percent are medium-sized, with 5 to 8 members; and 12 percent are extended families with more than 8 members.

3.2 Livelihood and diversification

3.2.1 On farm livelihood source

Most households in the surveyed villages obtain their livelihood from agriculture. Rice and maize are the most important crops, both for home consumption and for income generation. A small number of households, specifically recently migrated pastoralists, also integrate crop production with livestock rearing.

Because the smallholder producers in the study area are semi-subsistence farm households, part of the total product is retained for home consumption and the remainder is sold on the market. For example, almost 80 percent of rice-producing farmers reported selling, on average, 58 percent of their rice harvest to cover the costs of inputs and basic household needs. However, in most villages that are far from the nearest big market, where the milling services are located, farmers usually sell their rice harvest and later buy back milled rice from small traders at almost double their selling price. On average, households engaged in crop production in the wetlands obtain a gross income of 640 thousand TZS per hectare per year, of which 40 percent is accounted for by home consumption.

3.2.2 Off farm livelihood sources

Although farming is the dominant livelihood strategy for the majority of farmers, 26 percent of households reported receiving some form of off-farm income during the year. The most common sources of off-farm income in the area include remittances, land rental, brick selling and small business shops. Fourteen percent of the surveyed households received income from business, primarily from small retail stores and transportation services [bajaji and bodaboda]. Seven percent of households engaged in brick production, receiving on average a gross revenue of 64,763 TZS per year (often farmers residing close to Ifakara). Contrary to expectations, the majority of farm households surveyed did not engage in the sale of fish for cash. However, this does not mean that there is no fishing activity in the valley. The valley supports a number of households by providing income from fishing, an activity usually carried out by a marginalized group of fishers who base their livelihood solely on fishing.

Unequal access to resources and off-farm opportunities among farmers has led to rural wealth differentiation in the Kilombero Valley floodplain (KVFP). To analyse income inequality within the valley, we calculated a Gini coefficient and represented the distribution visually using a Lorenz curve [plotting cumulative income against cumulative population]. The Gini coefficient measures the extent to which the distribution of wealth within a group deviates from a perfectly equal distribution, with values from 0 (perfect equality) to 1 (maximal inequality) (World Bank, 2011). The Gini index (with 1,000 bootstrapped samples) in KVFP ranges between 55 and 68 percent, which is higher than the national index of 37 percent.
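A minimal base R sketch of how a Gini coefficient, a bootstrap interval and the points of a Lorenz curve can be computed. The function names (gini, lorenz_points, boot_gini) and the synthetic income vector are illustrative assumptions, not the notebook's own code or the survey data, and the percentile bootstrap shown here may differ from the method behind the reported range.

```r
# Gini coefficient of a non-negative numeric vector (0 = equality, 1 = maximal inequality)
gini <- function(x) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

# Lorenz curve: cumulative population share against cumulative income share
lorenz_points <- function(x) {
  x <- sort(x[!is.na(x)])
  data.frame(pop_share    = c(0, seq_along(x) / length(x)),
             income_share = c(0, cumsum(x) / sum(x)))
}

# Percentile bootstrap interval for the Gini coefficient (B resamples)
boot_gini <- function(x, B = 1000, probs = c(0.025, 0.975)) {
  stats <- replicate(B, gini(sample(x, replace = TRUE)))
  quantile(stats, probs = probs, names = FALSE)
}

set.seed(1)
income <- rlnorm(304, meanlog = 13, sdlog = 1)  # synthetic incomes, NOT survey data

g  <- gini(income)
ci <- boot_gini(income)
lc <- lorenz_points(income)

# Draw the Lorenz curve with the line of perfect equality for reference
if (interactive()) {
  plot(lc$pop_share, lc$income_share, type = "l",
       xlab = "Cumulative share of households",
       ylab = "Cumulative share of income")
  abline(0, 1, lty = 2)
}
```

The further the curve bends below the dashed 45-degree line, the larger the Gini coefficient.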

3.2.3 Food Security

3.2.4 Access to credit

Thirty-three percent of the surveyed households obtained credit from institutional and non-institutional sources. The National Agriculture Census (2009) reported even lower figures, corroborating that most farmers in Kilombero and Ulanga districts have limited access to credit, with only about 2.4 to 2.5 percent of households having access. Sources of credit include people's credit funds (Savings and Credit Cooperatives, SACCOs), commercial banks and village lenders. People's credit funds were the main source of formal credit (52 percent) because of their convenience and flexible payment terms, despite the high interest rates they charge. The largest proportion (63 percent) of credit is channeled to purchasing inputs. During the field visit it was observed that farmers also engage in contract farming with small traders. In this arrangement, the trader provides the cash required for expenditures during the planting season, and the farmer agrees to sell the expected rice harvest to the trader at a price agreed on in advance.

Source of Credit

Purpose of Credit

3.3 Resource Endowment

3.3.1 Land

In most sub-Saharan African countries, land is the most important asset and reflects the economic situation, power, prestige and security of a farmer or farm household. The amount of land to which a household has access, and the terms on which it uses that land, influence, if not determine, its decisions about how to use land resources to earn a livelihood. The average farm size in the study area was 2.6 hectares (SD = 2.8), with a maximum of 21 hectares. As the figure below shows, almost 55 percent of households own less than 2 hectares of land. Farmers typically own multiple plots, with 62 percent owning two or more. Usually the largest plot, located in the seasonally flooded area, is used for rice and/or maize production, while the smaller plots are often where the homestead is located and where households plant vegetables for home consumption. The figure also shows that farmers engage in multi-season farming or in the land rental market, which makes their actual planted area larger than the land they own. The figure below also indicates the relationship between farm size and household size. The fractional polynomial regression graph shows that larger families are likely to have larger farms. There are two plausible reasons: large families need more land to produce enough food for the household, and having more members makes it easier to manage a large farm.

As in most sub-Saharan African countries, the livelihood of most farmers in the Kilombero Valley is intimately tied to the land, and security of land ownership is critical to the economic, social and environmental functioning of the area. In Tanzania, land is in general the property of the state. However, customary rules usually govern access to and ownership of land. Eighty percent of the farmers surveyed reported that they own their land without any deeds. The remainder reported that the land is owned by the family, rented, or owned with deeds. Communal farmland is not common in the area. Demand for farmland in the wetland has been increasing in recent years. Increasing rainfall volatility, the frequent occurrence of extreme weather events and the immense livelihood potential of wetlands have attracted farmers from different parts of the country to acquire land in the valley. The survey results show that the majority of farmers acquired their wetland farm in the past 10 to 15 years. The following density plot of the year in which farmers started using the wetland shows that some households have been farming the wetland for over 40 years, but the majority acquired their farm in the past two to three decades. There were surges in the number of farmers who started using the wetland for cropping in the late 1980s and the early 2000s.

Farmers access and acquire land in the study area in eight different ways. Overall, one third of respondents acquired their land through inheritance from their parents, while 22 percent and 14 percent of farm households acquired their land through purchase and through occupation (clearing bush land or forest), respectively. Some farmers also acquired plots allocated by the district government office or the village authority. Given the economic, institutional and biophysical constraints farmers face in the study area, most households did not expand their farm in the past five years. Only 8 percent of the surveyed households have expanded their land, and 15 percent plan to do so.

Tenure

Land Acquisition

3.3.2 Labour

Having sufficient labor is one of the determining factors for household livelihoods in the valley. Labor is provided either by household members or hired. The survey results show that hiring and exchanging labor is common in the area. Ninety-four percent of surveyed households hired laborers during the 2015 cropping season to help with different stages of cultivation, mostly land preparation and cultivation. Hired laborers come either from the local labor pool or are migrants who come from different parts of Tanzania during the farming season. Although almost all farmers hire labor for cropping, the largest share of labor (measured in man-days per year) is provided by the family. On average, 63 percent of total man-days is provided by family labor and the remaining 37 percent by hired labor. However, there is large variation between farmers, with some depending entirely on hired labor and others entirely on family labor.

3.3.3 Capital

Most farmers perform the bulk of their farming activity using simple hand tools (such as hoes, axes and digging forks). However, the use of tractors and draft animals during initial land preparation is common in the valley. As most farmers cannot afford to own a tractor individually, they rely on the services of tractor owners who come to the valley from all over the country during the land preparation period. Fifty-seven percent of surveyed households hired a tractor for land preparation during the last cropping season. Hiring oxen is also common: 28 percent of the surveyed households hired oxen for land preparation. Farmers prefer hiring oxen primarily for two reasons: first, the demand for tractors exceeds the supply during peak season; second, most farms are located in the alluvial plain, where there is no road by which a tractor can reach the farm. Migrant pastoralists and agro-pastoralists are usually the providers of oxen, at a price almost equal to that of tractors. Nine percent of households did not hire any farm machinery or oxen and used traditional, manually operated tools.

All_Crops Land Preparation

Rice Land Preparation

Maize Land Preparation

3.3.4 Livestock

Although the two districts have experienced a surge in the number of cattle due to an increase in pastoralists and agro-pastoralists, most sedentary farmers are not engaged in extensive livestock farming. While 63 percent of the surveyed households reported owning at least one livestock animal, the large majority keep small ruminants. Chickens are the most commonly kept animals, with over 94 percent of livestock-keeping households raising at least one chicken or duck. Indigenous goats and sheep (24 percent), indigenous cattle (18 percent) and pigs (1 percent) are also kept.

As the plot below shows, livestock-keeping households in Kilombero and Ulanga districts own an average of 0.88 and 4.8 Tropical Livestock Units (TLU), respectively. In Ulanga district, livestock farmers keep relatively large herds, with some households owning more than 100 cattle, whereas poultry and small ruminants are the most common species in Kilombero. Overall, livestock play a minor economic role compared with crop production in the valley.
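A short sketch of how herd counts can be converted to Tropical Livestock Units in R. The conversion factors below (cattle 0.7, sheep and goats 0.1, pigs 0.2, poultry 0.01) are commonly cited values and are an assumption here; the notebook does not state which factors were used, and the herd data frame is made up for illustration.

```r
# Assumed TLU conversion factors per head (not confirmed by the source)
tlu_factors <- c(cattle = 0.7, goats = 0.1, sheep = 0.1, pigs = 0.2, poultry = 0.01)

# Hypothetical herd counts for three households
herd <- data.frame(
  hh_id   = 1:3,
  cattle  = c(0, 2, 120),
  goats   = c(3, 0, 10),
  sheep   = c(0, 0, 5),
  pigs    = c(0, 1, 0),
  poultry = c(10, 6, 0)
)

# TLU per household: head counts weighted by the species factor
herd$tlu <- as.vector(as.matrix(herd[names(tlu_factors)]) %*% tlu_factors)
```

Because one head of cattle counts seventy times as much as a chicken, a few large cattle herds can dominate a district average.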

3.4 Crop Production

3.4.1 Crop Choice and Land use

Paddy rice is the dominant crop cultivated in the area, which is not surprising given the flooding during the rainy season. As figure 14 shows, farmers allocate on average 80 percent of their land to rice and 13 percent to maize. Some farmers also produce vegetables, cassava, and other permanent crops and fruits. One puzzling characteristic of crop choice in the valley is the remarkable uniformity of farm units, which produce the same crop for the same reason year after year. Temporary mono-crop production predominates, with paddy rice and maize being the most important enterprises. Farmers base their decision primarily on land suitability, productivity and prices. The majority of farmers claim that their plot is either not suitable for other crops or is more productive under the current crop.

3.4.2 Input Use

3.4.3 Crop Production Challenges

Although wetlands are fertile and provide consistent moisture for agricultural production, farmers face a number of challenges in wetland farming. From the farmers' point of view, diseases are the main constraint on wetland use in the valley: malaria and other water-borne diseases are common in the Kilombero Valley. Farmers also reported agronomic constraints (pests, weeds and excessive floods) as major challenges for crop production.

3.5 Market Participation

Farmers market different proportions of their crops for cash. Rice is usually prioritized both for local consumption and for its income-generating potential. Eighty percent of surveyed households sold rice, compared with only 28 percent that reported selling maize. The survey results show that, on average, 60 percent of the rice and maize cultivated is sold for cash and the remaining 40 percent is retained for home consumption. The farmer commercialization index, a composite index of a farmer's total crop sales relative to total crop production, is 46 percent in the study area. The marketing channel is characterized by a large number of small traders operating between farmers and the rice mills or maize market located in Ifakara. Local traders buy small quantities directly from farmers and transport them to the mills, where the rice is milled and sold to inter-regional traders, local retailers or directly to consumers.
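The commercialization index can be sketched as the share of total crop output that is sold, computed per household. This is an illustration under assumptions: the column names and values are hypothetical, and it assumes sales and production are expressed in the same unit (for example, value in TZS), which the notebook does not state.

```r
# Share of output sold, in percent; undefined (NA) when nothing was produced
commercialization_index <- function(sold, produced) {
  ifelse(produced > 0, 100 * sold / produced, NA_real_)
}

# Hypothetical crop-level records (values in thousand TZS)
crops <- data.frame(
  hh_id          = c(1, 1, 2, 3),
  crop           = c("rice", "maize", "rice", "maize"),
  value_produced = c(800, 200, 500, 0),
  value_sold     = c(600, 0, 250, 0)
)

# Sum over crops within each household, then take the ratio
hh <- aggregate(cbind(value_produced, value_sold) ~ hh_id, data = crops, FUN = sum)
hh$hci <- commercialization_index(hh$value_sold, hh$value_produced)

# Sample-level index as the mean of the household indices
mean_hci <- mean(hh$hci, na.rm = TRUE)
```

Averaging household indices and dividing total sales by total production give different numbers when farm sizes differ, so the choice of aggregation should be reported alongside the index.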

3.6 Social Network and Institution

To make the most of their farming decisions, farmers need access to a range of information on production, technology, weather, marketing and so on, which saves them time and money. Farmers use different sources to access information on crop production, markets and government policy. For information related to production and extension services, relatives, friends or neighbors were the primary source, identified by 42 percent of respondents in the valley and 69.9 percent, while 23 percent and 21 percent identified extension officers and radio, respectively. For market information, farmers identified their social ties and radio as the main sources. Radio is the main source of information on new or changed government policy.

3.7 Access to Infrastructure

The survey results indicate that, among the services evaluated, the tarmac road was located farther from most households' dwellings than any other service, at an average distance of 28.43 kilometers. The other services and their average distances from the dwellings, in kilometers, were: telephone center (13.43), health center (4) and river or stream (2.6).

3.8 Shock and Responses

Across the valley, farmers face frequent risks and shocks to their agricultural production and livelihoods, including crop pest and disease outbreaks, volatile market prices and extreme weather events. Farming in the Kilombero Valley is often subject to environmental disturbances such as drought, waterlogging, floods, untimely or uneven distribution of rainfall, and the incidence of pests and diseases. Based on self-reported challenges over the past 5 years, crop pests and diseases affected 74 percent of all farmers surveyed. Drought and flooding affected 56 percent and 40 percent of households, respectively. Other risks to farmer livelihoods relate to high volatility in market prices for agricultural products: large increases in the prices of agricultural inputs (31 percent), low prices for their products (58 percent) and increases in food prices (38 percent). The survey results also show that farmers have limited strategies for coping with these shocks. Almost 50 percent of households reported that they did nothing to cope with the shock, 18 percent worked more to recover, 11 percent used their own savings, and 10 percent received help from relatives and friends.

Shock Types

Coping Mechanisms

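# Perception items (Likert-coded), taken from the list of data sets loaded earlier
original.data <- list.data$Perception2.dta
# Drop rows with missing responses (remove_missing() is assumed to be ggplot2's helper)
original.data <- remove_missing(original.data)

# Convert a coded response column to a labelled factor.
# The labels are assigned to the sorted unique codes, so each column must
# contain exactly five distinct codes; otherwise factor() stops with an error.
myfactor <- function(x) {
  factor(x, labels = c("Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree"))
}

# Apply the conversion to every item and rebuild the data frame
original.data2 <- lapply(original.data, myfactor)
original.data2 <- as.data.frame(original.data2)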

names(original.data2) <- c(
  "Compared to the past, <br> the use of the wetlands for crop production <br> in this area has increased",
  "Compared to the past, the use of wetlands <br> for material collection has increased ",
  "Compared to the past, the level of fertility <br> in the wetland has declined ",
  "People in this community would support <br> efforts to conserve the wetland(s)",
  "Compared to 10 years ago, the amount of water <br> in the wetlands  in this area has declined ",
  "Compared to the last 10 years,<br> fishing activities in the wetland have increased",
  "People in this area feel that they own the wetland ",
  "People in this area care about <br>public natural resources ",
  "I have witnessed some form of conflict<br> over wetland resources in the past",
  "Government officials are effective<br> in protecting the wetland",
  "It is safe to leave my home unattended<br> because no one can steal anything",
  "People in this area feel generally secure<br> when dealing with outsiders ",
  "If I drop my wallet or purse somewhere within<br> this village, I am likely to get it back",
  "People in this area have <br>strong traditional attachment to wetlands",
  "Most of my neighbours  have lost some assets <br> or livestock to thieves within the last five years",
  "Farm sizes in my village have <br>declined compared to the past",
  "The population in my village has increased<br> currently compared to 10 years ago",
  "In the past 5 years,<br> I have noticed an increase in <br>the amount of income I generate from the Wetland",
  "My family’s economic situation has <br>improved compared with 5 years ago"
)

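# Summarise each item as the percentage of responses per category (likert package)
original.data3 <- original.data2
df_summary <- likert(original.data3)
df_summary <- as.data.frame(df_summary$results)

# Split the 19 items into three groups, one per chart
df_summary_1 <- df_summary[c(1:6),]
df_summary_2 <- df_summary[c(7:12),]
df_summary_3 <- df_summary[c(13:19),]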

# write.csv(df_summary_1, here::here("Data_csv","Perception1.csv"))
# write.csv(df_summary_2, here::here("Data_csv","Perception2.csv"))
# write.csv(df_summary_3, here::here("Data_csv","Perception3.csv"))

3.9 Perception of Wetlands

Sustainable use of a common-pool resource is related to the positive and negative opinions and perceptions that farmers hold regarding its benefits, the underlying pressures and the social structure of its users, to mention a few. To understand the opinions and perceptions of farmers in KVFP towards the wetland ecosystem, we documented farmers' responses to a set of 19 statements. The statements use a typical five-point Likert scale [strongly agree, agree, neutral, disagree and strongly disagree] to tap into the cognitive and affective components of their attitudes. The following charts show the percentage of farmers expressing agreement or disagreement with each statement on a symmetric agree-disagree scale. The second chart presents the distribution of responses for each question. For example, 81 percent of farmers agree that people in the community would support efforts to conserve wetlands in KVFP. Eighty percent of farmers also agree that the use of wetlands for crop production has increased over the years. Although almost all surveyed farmers feel that they own the wetland, only 25 percent agree that they have a strong traditional attachment to wetlands. Ninety-eight percent of farmers hold the opinion that the population of their village has increased over time, and 58 percent that farm sizes have declined compared with the past.
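A minimal base R sketch of how percentages such as "81 percent agree" can be derived from coded Likert responses. The item names, the toy responses and the coding (1 = strongly agree through 5 = strongly disagree) are assumptions made for illustration. Passing explicit levels to factor() keeps the labels aligned even when a category is never chosen for an item.

```r
scale_labels <- c("Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree")

# Hypothetical coded responses for two items (assumed coding: 1 = strongly agree)
responses <- data.frame(
  item_a = c(1, 2, 2, 5, 3, 1),
  item_b = c(4, 4, 2, 1, 5, 3)
)

# Explicit levels guarantee five categories per item, even if one is unused
to_likert <- function(x) factor(x, levels = 1:5, labels = scale_labels)

# Percentage of responses per category: one row per item, one column per category
likert_pct <- function(df) {
  m <- vapply(df, function(col) {
    as.numeric(100 * prop.table(table(to_likert(col))))
  }, numeric(5))
  pct <- t(m)
  colnames(pct) <- scale_labels
  pct
}

pct <- likert_pct(responses)

# Overall agreement combines the two agreeing categories
agree_share <- pct[, "Strongly Agree"] + pct[, "Agree"]
```

Whether "agree" in the reported figures includes "strongly agree" is not stated in the text; the sketch combines the two.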

Perception One

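options(stringsAsFactors = FALSE)
# Load the first group of perception items (percentage per response category)
data <- read.csv(here::here("Data_csv","Perception1.csv"))
# Keep the item text and the five category columns; column 1 holds the row names written by write.csv()
data <- data[,c(2:7)]
# round_df() is a helper assumed to be defined elsewhere in the notebook
data <- round_df(data)
names(data) <- c("y", "x1","x2","x3","x4", "x5")
y <- data$y
x1 <- data$x1
x2 <- data$x2
x3 <- data$x3
x4 <- data$x4
x5 <- data$x5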

top_labels <- c('Strongly<br>agree', 'Agree', 'Neutral', 'Disagree', 'Strongly<br>disagree')
p <- plot_ly(data, x = ~x1, y = ~y, type = 'bar', orientation = 'h', name="Strongly<br>agree",
             marker = list(color = 'rgba(38, 24, 74, 0.8)',
                           line = list(color = 'rgb(248, 248, 249)', width = 1))) %>%
  add_trace(x = ~x2, name="Agree", marker = list(color = 'rgba(71, 58, 131, 0.8)')) %>%
  add_trace(x = ~x3,name="Neutral", marker = list(color = 'rgba(122, 120, 168, 0.8)')) %>%
  add_trace(x = ~x4, name="Disagree", marker = list(color = 'rgba(164, 163, 204, 0.85)')) %>%
  add_trace(x = ~x5, name="Strongly Disagree", marker = list(color = 'rgba(190, 192, 213, 1)')) %>%
  layout(xaxis = list(title = "",
                      showgrid = FALSE,
                      showline = FALSE,
                      showticklabels = FALSE,
                      zeroline = FALSE,
                      domain = c(0.15, 1)),
         yaxis = list(title = "",
                      showgrid = FALSE,
                      showline = FALSE,
                      showticklabels = FALSE,
                      zeroline = FALSE),
         barmode = 'stack',
         paper_bgcolor = 'rgb(248, 248, 255)', plot_bgcolor = 'rgb(248, 248, 255)',
         margin = list(l = 150, r = 5, t = 140, b = 60),
         showlegend = FALSE) %>%
  # labeling the y-axis
  add_annotations(xref = 'paper', yref = 'y', x = 0.14, y = y,
                  xanchor = 'right',
                  text = y,
                  font = list(family = 'Muli', size = 10,
                              color = 'rgb(67, 67, 67)'),
                  showarrow = FALSE, align = 'right') %>%
  # labeling the percentages of each bar (x_axis)
  add_annotations(xref = 'x', yref = 'y',
                  x = x1 / 2, y = y,
                  text = paste(data[,"x1"], '%'),
                  font = list(family = 'Muli', size = 10,
                              color = 'rgb(248, 248, 255)'),
                  showarrow = FALSE) %>%
  add_annotations(xref = 'x', yref = 'y',
                  x = x1 + x2 / 2, y = y,
                  text = paste(data[,"x2"], '%'),
                  font = list(family = 'Muli', size = 10,
                              color = 'rgb(248, 248, 255)'),
                  showarrow = FALSE) %>%
  add_annotations(xref = 'x', yref = 'y',
                  x = x1 + x2 + x3 / 2, y = y,
                  text = paste(data[,"x3"], '%'),
                  font = list(family = 'Muli', size = 10,
                              color = 'rgb(248, 248, 255)'),
                  showarrow = FALSE) %>%
  add_annotations(xref = 'x', yref = 'y',
                  x = x1 + x2 + x3 + x4 / 2, y = y,
                  text = paste(data[,"x4"], '%'),
                  font = list(family = 'Muli', size = 10,
                              color = 'rgb(248, 248, 255)'),
                  showarrow = FALSE) %>%
  add_annotations(xref = 'x', yref = 'y',
                  x = x1 + x2 + x3 + x4 + x5 / 2, y = y,
                  text = paste(data[,"x5"], '%'),
                  font = list(family = 'Muli', size = 10,
                              color = 'rgb(248, 248, 255)'),
                  showarrow = FALSE) %>%
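  # labeling the Likert scale categories (on the top); the x positions below are
  # hard-coded midpoints for segments of 21, 30, 21, 16 and 12 percent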
  add_annotations(xref = 'x', yref = 'paper',
                  x = c(21 / 2, 21 + 30 / 2, 21 + 30 + 21 / 2, 21 + 30 + 21 + 16 / 2,
                        21 + 30 + 21 + 16 + 12 / 2),
                  y = 1.15,
                  text = top_labels,
                  font = list(family = 'Muli', size = 10,
                              color = 'rgb(67, 67, 67)'),
                  showarrow = FALSE)
p
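The percentage labels in the chart above are placed at the midpoint of each stacked segment, which is why expressions of the form x1 + x2 / 2 appear in the annotations. The same positions can be computed for any number of categories with a small helper. This is a sketch; segment_midpoints is a hypothetical name and not part of plotly.

```r
# Midpoint of each stacked segment: the sum of the preceding widths plus half the own width.
# `widths` has one row per bar and one column per category (stacking order).
segment_midpoints <- function(widths) {
  widths <- as.matrix(widths)
  ends <- widths
  # Running total across the columns gives the right edge of each segment
  for (j in seq_len(ncol(widths))[-1]) {
    ends[, j] <- ends[, j - 1] + widths[, j]
  }
  ends - widths / 2
}

# Example with one bar whose segments are 21, 30, 21, 16 and 12 percent wide
mids <- segment_midpoints(matrix(c(21, 30, 21, 16, 12), nrow = 1))
```

Applied to the first bar's percentages, the helper returns x positions that could replace hard-coded label positions for the category headings.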

Perception Two

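options(stringsAsFactors = FALSE)
# Load the second group of perception items (percentage per response category)
data <- read.csv(here::here("Data_csv","Perception2.csv"))
# Keep the item text and the five category columns; column 1 holds the row names written by write.csv()
data <- data[,c(2:7)]
# round_df() is a helper assumed to be defined elsewhere in the notebook
data <- round_df(data)
names(data) <- c("y", "x1","x2","x3","x4", "x5")
y <- data$y
x1 <- data$x1
x2 <- data$x2
x3 <- data$x3
x4 <- data$x4
x5 <- data$x5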

top_labels <- c('Strongly<br>agree', 'Agree', 'Neutral', 'Disagree', 'Strongly<br>disagree')
p2 <- plot_ly(data, x = ~x1, y = ~y, type = 'bar', orientation = 'h', name="Strongly<br>agree",
              marker = list(color = 'rgba(38, 24, 74, 0.8)',
                            line = list(color = 'rgb(248, 248, 249)', width = 1))) %>%
  add_trace(x = ~x2, name="Agree", marker = list(color = 'rgba(71, 58, 131, 0.8)')) %>%
  add_trace(x = ~x3,name="Neutral", marker = list(color = 'rgba(122, 120, 168, 0.8)')) %>%
  add_trace(x = ~x4, name="Disagree", marker = list(color = 'rgba(164, 163, 204, 0.85)')) %>%
  add_trace(x = ~x5, name="Strongly Disagree", marker = list(color = 'rgba(190, 192, 213, 1)')) %>%
  layout(xaxis = list(title = "",
                      showgrid = FALSE,
                      showline = FALSE,
                      showticklabels = FALSE,
                      zeroline = FALSE,
                      domain = c(0.15, 1)),
         yaxis = list(title = "",
                      showgrid = FALSE,
                      showline = FALSE,
                      showticklabels = FALSE,
                      zeroline = FALSE),
         barmode = 'stack',
         paper_bgcolor = 'rgb(248, 248, 255)', plot_bgcolor = 'rgb(248, 248, 255)',
         margin = list(l = 150, r = 5, t = 140, b = 60),
         showlegend = FALSE) %>%
  # labeling the y-axis
  add_annotations(xref = 'paper', yref = 'y', x = 0.14, y = y,
                  xanchor = 'right',
                  text = y,
                  font = list(family = 'Muli', size = 10,
                              color = 'rgb(67, 67, 67)'),
                  showarrow = FALSE, align = 'right') %>%
  # labeling the percentages of each bar (x_axis)
  add_annotations(xref = 'x', yref = 'y',
                  x = x1 / 2, y = y,
                  text = paste(data[,"x1"], '%'),
                  font = list(family = 'Muli', size = 10,
                              color = 'rgb(248, 248, 255)'),
                  showarrow = FALSE) %>%
  add_annotations(xref = 'x', yref = 'y',
                  x = x1 + x2 / 2, y = y,
                  text = paste(data[,"x2"], '%'),
                  font = list(family = 'Muli', size = 10,
                              color = 'rgb(248, 248, 255)'),
                  showarrow = FALSE) %>%
  add_annotations(xref = 'x', yref = 'y',
                  x = x1 + x2 + x3 / 2, y = y,
                  text = paste(data[,"x3"], '%'),
                  font = list(family = 'Muli', size = 10,
                              color = 'rgb(248, 248, 255)'),
                  showarrow = FALSE) %>%
  add_annotations(xref = 'x', yref = 'y',
                  x = x1 + x2 + x3 + x4 / 2, y = y,
                  text = paste(data[,"x4"], '%'),
                  font = list(family = 'Muli', size = 10,
                              color = 'rgb(248, 248, 255)'),
                  showarrow = FALSE) %>%
  add_annotations(xref = 'x', yref = 'y',
                  x = x1 + x2 + x3 + x4 + x5 / 2, y = y,
                  text = paste(data[,"x5"], '%'),
                  font = list(family = 'Muli', size = 10,
                              color = 'rgb(248, 248, 255)'),
                  showarrow = FALSE) %>%
  # labeling the first Likert scale (on the top)
  add_annotations(xref = 'x', yref = 'paper',
                  x = c(21 / 2, 21 + 30 / 2, 21 + 30 + 21 / 2, 21 + 30 + 21 + 16 / 2,
                        21 + 30 + 21 + 16 + 12 / 2),
                  y = 1.15,
                  text = top_labels,
                  font = list(family = 'Muli', size = 10,
                              color = 'rgb(67, 67, 67)'),
                  showarrow = FALSE)
p2

Perception Three

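# Keep character columns as strings when reading the CSV
options(stringsAsFactors = FALSE)
data <- read.csv(here::here("Data_csv", "Perception3.csv"))
# Keep the statement column and the five response columns
data <- data[, 2:7]
# round_df() is a helper assumed to be defined earlier in the notebook
data <- round_df(data)
names(data) <- c("y", "x1", "x2", "x3", "x4", "x5")
# Plain vectors used to position the annotations below
y <- data$y
x1 <- data$x1
x2 <- data$x2
x3 <- data$x3
x4 <- data$x4
x5 <- data$x5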

top_labels <- c('Strongly<br>agree', 'Agree', 'Neutral', 'Disagree', 'Strongly<br>disagree')
p3 <- plot_ly(data, x = ~x1, y = ~y, type = 'bar', orientation = 'h', name="Strongly<br>agree",
              marker = list(color = 'rgba(38, 24, 74, 0.8)',
                            line = list(color = 'rgb(248, 248, 249)', width = 1))) %>%
  add_trace(x = ~x2, name="Agree", marker = list(color = 'rgba(71, 58, 131, 0.8)')) %>%
  add_trace(x = ~x3,name="Neutral", marker = list(color = 'rgba(122, 120, 168, 0.8)')) %>%
  add_trace(x = ~x4, name="Disagree", marker = list(color = 'rgba(164, 163, 204, 0.85)')) %>%
  add_trace(x = ~x5, name="Strongly Disagree", marker = list(color = 'rgba(190, 192, 213, 1)')) %>%
  layout(xaxis = list(title = "",
                      showgrid = FALSE,
                      showline = FALSE,
                      showticklabels = FALSE,
                      zeroline = FALSE,
                      domain = c(0.15, 1)),
         yaxis = list(title = "",
                      showgrid = FALSE,
                      showline = FALSE,
                      showticklabels = FALSE,
                      zeroline = FALSE),
         barmode = 'stack',
         paper_bgcolor = 'rgb(248, 248, 255)', plot_bgcolor = 'rgb(248, 248, 255)',
         margin = list(l = 150, r = 5, t = 140, b = 60),
         showlegend = FALSE) %>%
  
  
  # labeling the y-axis
  add_annotations(xref = 'paper', yref = 'y', x = 0.14, y = y,
                  xanchor = 'right',
                  text = y,
                  font = list(family = 'Muli', size = 10,
                              color = 'rgb(67, 67, 67)'),
                  showarrow = FALSE, align = 'right') %>%
  # labeling the percentages of each bar (x_axis)
  add_annotations(xref = 'x', yref = 'y',
                  x = x1 / 2, y = y,
                  text = paste(data[,"x1"], '%'),
                  font = list(family = 'Muli', size = 10,
                              color = 'rgb(248, 248, 255)'),
                  showarrow = FALSE) %>%
  add_annotations(xref = 'x', yref = 'y',
                  x = x1 + x2 / 2, y = y,
                  text = paste(data[,"x2"], '%'),
                  font = list(family = 'Muli', size = 10,
                              color = 'rgb(248, 248, 255)'),
                  showarrow = FALSE) %>%
  add_annotations(xref = 'x', yref = 'y',
                  x = x1 + x2 + x3 / 2, y = y,
                  text = paste(data[,"x3"], '%'),
                  font = list(family = 'Muli', size = 10,
                              color = 'rgb(248, 248, 255)'),
                  showarrow = FALSE) %>%
  add_annotations(xref = 'x', yref = 'y',
                  x = x1 + x2 + x3 + x4 / 2, y = y,
                  text = paste(data[,"x4"], '%'),
                  font = list(family = 'Muli', size = 10,
                              color = 'rgb(248, 248, 255)'),
                  showarrow = FALSE) %>%
  add_annotations(xref = 'x', yref = 'y',
                  x = x1 + x2 + x3 + x4 + x5 / 2, y = y,
                  text = paste(data[,"x5"], '%'),
                  font = list(family = 'Muli', size = 10,
                              color = 'rgb(248, 248, 255)'),
                  showarrow = FALSE) %>%
  # labeling the first Likert scale (on the top)
  add_annotations(xref = 'x', yref = 'paper',
                  x = c(21 / 2, 21 + 30 / 2, 21 + 30 + 21 / 2, 21 + 30 + 21 + 16 / 2,
                        21 + 30 + 21 + 16 + 12 / 2),
                  y = 1.15,
                  text = top_labels,
                  font = list(family = 'Muli', size = 10,
                              color = 'rgb(67, 67, 67)'),
                  showarrow = FALSE)
p3

4 Characterizing farmers in KVFP through Typology

To capture farmer heterogeneity and elicit the diversity of livelihoods and strategies, we created an attribute-based typology using a non-parametric multivariate statistical technique. Farmer typology research has become popular as a way of segmenting farmers into groups to assist in developing targeted programs. In this approach, survey data are collected from farmers and then clustered statistically, from the data upwards, to develop groupings. The emerging types are grounded in the data rather than obtained by classifying cases (farmers) into predetermined classes, as in expert-based farmer typologies. Once farm types are identified, farmers can be discriminated by the characteristics of their households and of their farm management. In particular, types of farmers can be distinguished on the basis of their land use, income, involvement in on-farm and off-farm activities, market participation, and access to infrastructure. Different types of farmers are expected to pursue different land use trajectories, with important effects on the various ecosystem services that flow from the KVFP. These differences can result in heterogeneous uptake of alternative farming practices and future technologies, and can be used to target interventions more effectively. The methodology in this study involves two steps: Principal Component Analysis to reduce the dimensionality of the variables under consideration, and Hierarchical Clustering to group farmers into different segments.

Principal component analysis is a multivariate statistical technique that linearly transforms an original set of variables into a smaller set of uncorrelated variables (called principal components) that account for decreasing proportions of the total variance of the original variables (Dunteman 1989). This phase can be considered a denoising step, which can lead to a more stable clustering. According to Husson, Lê, and Pagès (2017), PCA can be used to separate signal from noise in the original data set, with the first components extracting the essential information and the last components representing the noise. As such, applying the clustering to the retained principal components, without the noise, leads to more stable clusters.

Hierarchical clustering. Clustering is a multivariate statistical procedure that starts with a data set containing information about a sample of cases and attempts to reorganize these cases into relatively homogeneous groups based on the calculation of their Euclidean distance from the cluster centers. As a technique, it is used for exploring data sets to assess whether they can be summarized meaningfully in terms of a relatively small number of groups or clusters of cases or individuals that resemble each other and that differ in some respects from individuals in other clusters (Everitt et al. 2011). There are several approaches to clustering: partitioning, hierarchical, density-based, grid-based, and model-based. In this study, we used hierarchical clustering, an alternative to partitioning methods such as k-means for identifying groups in a data set. It does not require pre-specifying the number of clusters to be generated.

A total of 14 variables were selected based on farmers' livelihood and land use choices. The selected variables include age of the household head, household size, access to market, commercialization index, land allocation to crops (rice, maize, and vegetables), livestock ownership, access to water, and access to off-farm and on-farm income, among others. To reduce the effect of differences in units of measurement, the variables were scaled.

In the following sections, we describe the procedure (and R code) followed in grouping the farmers. The clustering was implemented in the R statistical software.

The main references for the statistical methods and software implementation are listed at the end of the notebook.

4.1 Principal Component Analysis (PCA)

As stated in the previous section, principal component analysis is used to extract the important information from a multivariate data table and to express this information as a set of new variables called principal components.

Construction of the PCA involves a number of steps. The first step is to center and scale the data with the scale() function (Everitt et al. 2011). Scaling the data values includes:

1. Center: subtract from each value the mean of the corresponding vector.
2. Scale: divide the centered vector by its root mean square (rms), where \(x_{i}\) denotes the centered values:

\[ x_{rms} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}x_{i}^{2}} \]

Result: mean = 0 and standard deviation = 1.
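The two scaling steps can be checked by hand against scale(). The following is a minimal sketch in base R; the matrix `toy` and its column names are hypothetical illustration values, not survey data.

```r
# Hypothetical toy data (not from the survey)
toy <- cbind(
  farm_size_ha = c(0.5, 1.2, 2.0, 3.5, 8.0),
  hh_size      = c(3, 5, 4, 7, 9)
)

# Step 1: center each column on its mean
centered <- sweep(toy, 2, colMeans(toy))

# Step 2: divide each centered column by its root mean square (n - 1 divisor)
rms    <- sqrt(colSums(centered^2) / (nrow(toy) - 1))
manual <- sweep(centered, 2, rms, "/")

# scale() performs both steps in one call
scaled <- scale(toy)

# The manual result and scale() agree; means are 0 and standard deviations are 1
all.equal(manual, scaled, check.attributes = FALSE)
round(colMeans(scaled), 10)
apply(scaled, 2, sd)
```

Because the rms is computed on centered values with an n - 1 divisor, it coincides with the sample standard deviation, which is why the scaled columns have unit standard deviation.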

Once the data are scaled, we run the PCA function from the FactoMineR package (Lê, Josse, and Husson 2008) and extract the eigenvalues, the loadings, and the contribution of each variable to the components (Kassambara and Mundt 2017). Table 6 shows the eigenvalues of the top 6 components. The eigenvalues represent the amount of variation retained by each PC. The first PC corresponds to the direction with the maximum amount of variation in the data set. Together, the first five components represent 68 percent of the variation in the original data set.
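How eigenvalues translate into retained variation can be sketched as follows. This example uses base R's prcomp() as a stand-in for the FactoMineR PCA function used in the notebook, and the matrix `X` is simulated, so the numbers do not reproduce Table 6.

```r
set.seed(1)
# Simulated stand-in for the scaled survey variables (hypothetical data)
X <- matrix(rnorm(100 * 6), ncol = 6)
X[, 2] <- X[, 1] + rnorm(100, sd = 0.5)  # make two variables correlated

# Standardized PCA: variables are centered and scaled to unit variance
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Eigenvalues are the variances of the principal components
eig <- pca$sdev^2

eig_table <- data.frame(
  eigenvalue     = eig,
  variance_pct   = 100 * eig / sum(eig),
  cumulative_pct = 100 * cumsum(eig) / sum(eig)
)
round(eig_table, 2)
```

In a standardized PCA the eigenvalues sum to the number of variables, so each eigenvalue divided by that number gives the share of variation retained by the component.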

The correlation between a variable and a PC is called its loading. The variables can be plotted as points in the component space using their loadings as coordinates. The squared loadings are called cos2 (= cor * cor = coord * coord). The cos2 values are used to estimate the quality of the representation. The closer a variable is to the circle of correlations, the better its representation on the factor map. Variables that are close to the center of the plot are less important for the first components.
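The relation between loadings and cos2 can be illustrated with a small sketch. It again uses prcomp() from base R rather than FactoMineR, and the variable names `v1` to `v4` are hypothetical.

```r
set.seed(1)
# Simulated data with hypothetical variable names
X <- matrix(rnorm(100 * 4), ncol = 4,
            dimnames = list(NULL, c("v1", "v2", "v3", "v4")))
X[, "v2"] <- X[, "v1"] + rnorm(100, sd = 0.5)

pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Loading: correlation between each variable and each component score
loadings <- cor(X, pca$x)

# Equivalent computation: eigenvector entries times component standard deviations
loadings_alt <- sweep(pca$rotation, 2, pca$sdev, "*")

# cos2: squared loadings, the quality of representation on each component
cos2 <- loadings^2

round(cos2[, 1:2], 3)
rowSums(cos2)  # equals 1 per variable when all components are kept
```

A variable whose cos2 on the first two components sums to nearly 1 lies close to the circle of correlations on the first factor map.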

The following table and figures show the contributions of variables in accounting for the variability in a given principal component. For example, farm size in hectares accounts for 34 percent of the first component, followed by tropical livestock units (TLU), which account for 27 percent. On the other hand, because the components are orthogonal, a different set of variables explains the variability of the second component, with the land shares of rice and maize accounting for 70 percent.
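A variable's contribution to a component is its cos2 on that component divided by the component's eigenvalue, expressed as a percentage. The sketch below shows this with prcomp() on simulated data; the variable names are hypothetical and the values are unrelated to the survey results.

```r
set.seed(1)
# Simulated data with hypothetical variable names
X <- matrix(rnorm(100 * 4), ncol = 4,
            dimnames = list(NULL, c("v1", "v2", "v3", "v4")))
X[, "v2"] <- X[, "v1"] + rnorm(100, sd = 0.5)

pca  <- prcomp(X, center = TRUE, scale. = TRUE)
eig  <- pca$sdev^2
cos2 <- sweep(pca$rotation, 2, pca$sdev, "*")^2

# Contribution (%) of each variable to each component
contrib <- 100 * sweep(cos2, 2, eig, "/")

round(contrib[, 1:2], 1)
colSums(contrib)  # contributions to each component add up to 100
```

Since the contributions to one component sum to 100, a statement such as "farm size accounts for 34 percent of the first component" reads directly off one column of this table.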

[Figures: contributions of variables to the principal components, shown in separate panels for all dimensions and for Dimensions 1 to 5.]

4.2 Hierarchical clustering on Principal components (HCPC)

Hierarchical clustering algorithms output a dendrogram, which is a tree representation of the data whose leaves are the input patterns and whose non-leaf nodes represent a hierarchy of groupings (Husson et al. 2011). There are different types of hierarchical clustering (HC), with agglomerative and divisive being the most widely used. Agglomerative HC works bottom-up: each individual starts in a separate cluster, and clusters are then iteratively merged according to some criterion. Divisive algorithms start from the whole data set in a single cluster and work top-down by iteratively dividing each cluster into two until all clusters are singletons. Here we used agglomerative HC with Ward’s criterion. This criterion decomposes the total inertia (total variance) into between-group and within-group inertia and, at each step of the algorithm, merges the two clusters for which the growth of within-inertia is minimal (in other words, it minimizes the reduction of between-inertia). The total inertia can be decomposed as:

\[ \sum_{k=1}^{K}\sum_{q=1}^{Q}\sum_{i=1}^{I_{q}}{(x_{iqk}-{\overline{x}}_{k})}^{2} = \sum_{k=1}^{K}\sum_{q=1}^{Q} I_{q}{({\overline{x}}_{qk}-{\overline{x}}_{k})}^{2}+\sum_{k=1}^{K}\sum_{q=1}^{Q}\sum_{i=1}^{I_{q}}{(x_{iqk}-{\overline{x}}_{qk})}^{2} \]

with \(x_{iqk}\) the value of variable \(k\) for individual \(i\) of cluster \(q\), \({\overline{x}}_{qk}\) the mean of variable \(k\) for cluster \(q\), \({\overline{x}}_{k}\) the overall mean of variable \(k\), and \(I_{q}\) the number of individuals in cluster \(q\). The left-hand side is the total inertia; the first term on the right-hand side is the between-inertia and the second term is the within-inertia.
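The decomposition can be verified numerically for any partition. The sketch below uses simulated data and an arbitrary three-cluster partition obtained with base R's hclust(); none of the values come from the survey.

```r
set.seed(42)
# Simulated, scaled data and a three-cluster partition (hypothetical)
X  <- scale(matrix(rnorm(60 * 3), ncol = 3))
cl <- cutree(hclust(dist(X), method = "ward.D2"), k = 3)

# Total inertia: squared deviations from the overall means
grand_mean <- colMeans(X)
total <- sum(sweep(X, 2, grand_mean)^2)

between <- 0
within  <- 0
for (q in unique(cl)) {
  Xq     <- X[cl == q, , drop = FALSE]
  mean_q <- colMeans(Xq)
  # Between-inertia: cluster size times squared distance of cluster mean to overall mean
  between <- between + nrow(Xq) * sum((mean_q - grand_mean)^2)
  # Within-inertia: squared deviations from the cluster mean
  within  <- within + sum(sweep(Xq, 2, mean_q)^2)
}

c(total = total, between = between, within = within)
all.equal(total, between + within)
```

Because the total is fixed by the data, any merge that increases within-inertia reduces between-inertia by the same amount, which is the trade-off Ward's criterion controls.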

The hierarchy is represented by a dendrogram which is indexed by the gain of within-inertia.
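The overall workflow (PCA, then Ward clustering on the retained components, then cutting the tree) can be sketched in base R as follows. This is an illustration under stated assumptions, not the notebook's own code: it uses prcomp() and hclust() instead of the FactoMineR functions, the data are simulated, and keeping five components and three clusters simply mirrors the choices described in the text.

```r
set.seed(123)
# Simulated data with three loosely separated groups (hypothetical)
X <- rbind(
  matrix(rnorm(40 * 6, mean = 0),  ncol = 6),
  matrix(rnorm(30 * 6, mean = 3),  ncol = 6),
  matrix(rnorm(10 * 6, mean = -3), ncol = 6)
)

# Step 1: standardized PCA, keeping the first five components
pca    <- prcomp(X, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:5]

# Step 2: agglomerative clustering with Ward's criterion on the component scores
tree <- hclust(dist(scores), method = "ward.D2")

# Step 3: cut the dendrogram into three clusters
clusters <- cutree(tree, k = 3)

table(clusters)
round(100 * prop.table(table(clusters)), 1)

# plot(tree, labels = FALSE)  # uncomment to draw the dendrogram
```

The heights at which branches merge in `tree` play the role of the inertia gain that indexes the dendrogram, so large jumps in height suggest where to cut.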

From the dendrogram and the factor maps above, we observe that the sampled farmers in KVFP can be grouped into three groups. Once the clusters are identified, the next step is to characterize them based on the important variables and to assign labels using a heuristic approach. Table 8 shows the link between the identified clusters and the variables. The share of land allocated to rice, farm size, land allocated to maize, TLU, and household size are the top five variables in segmenting the farmers into the three clusters.

The tables below also show which variables are significantly associated with each cluster. For example, the share of rice, the percentage of hired labour, and the household commercialization index are significantly and positively associated with the first cluster. The sign of the v-test shows whether a variable is higher or lower in the cluster than the overall mean of that variable. Given the importance of these variables, we labeled the first cluster “Mono-crop rice producers”: almost 90 percent of their land is allocated to rice (compared with 78 percent for all farmers), and they have relatively higher hired labor and market participation and relatively lower per capita income.
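The sign logic of the v-test can be sketched with a small function. The formula below follows the general form of the v-test for a quantitative variable (cluster mean minus overall mean, divided by the standard deviation of the cluster mean under random sampling without replacement); the use of the population variance is an assumption, and the exact estimator in the software may differ slightly. The function name `v_test` and the data are hypothetical.

```r
# Hypothetical helper: v-test of variable x for cluster q
v_test <- function(x, cluster, q) {
  n   <- length(x)
  n_q <- sum(cluster == q)
  s2  <- sum((x - mean(x))^2) / n  # assumption: population variance
  (mean(x[cluster == q]) - mean(x)) /
    sqrt((s2 / n_q) * ((n - n_q) / (n - 1)))
}

set.seed(7)
# Simulated rice land shares for two clusters (hypothetical values)
rice_share <- c(rnorm(20, mean = 90, sd = 5), rnorm(10, mean = 50, sd = 10))
cluster    <- rep(c(1, 2), times = c(20, 10))

v1 <- v_test(rice_share, cluster, 1)  # positive: cluster mean above overall mean
v2 <- v_test(rice_share, cluster, 2)  # negative: cluster mean below overall mean
c(cluster1 = v1, cluster2 = v2)
```

A large positive value marks a variable that is characteristically high in a cluster, and a large negative value marks one that is characteristically low.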

The share of land allocated to maize and vegetables, the percentage of hired labour, and the share of rice are the variables most significantly associated with cluster two. Hence, we labeled the second cluster of farmers “Diversifiers”, with their land allocated to maize, vegetables, and rice (significantly lower than the overall average).

Finally, the third cluster is associated with farm size, TLU, household size, and per capita income. Given the mix of farming and livestock keeping in this cluster, we labeled it “Agro-Pastorals”. The agro-pastorals have relatively larger farms, more TLU, larger households, and higher per capita income than their peers in the valley, but lower (crop) market participation and fewer labor man-days per year per hectare.

Based on the above labeling of the clusters, the following chart shows the proportion of each farm type among the sampled farmers. The majority (65 percent) of the farmers are mono-crop rice producers, 28 percent are diversifiers, and the remaining 7 percent are agro-pastorals.

From the box plots below, one can observe the main differences between farm types in attributes that are important for understanding farm management and land use trajectories.

4.2.1 Box plot for the variables and farm type

5 Validation of Typology with the 2007 Agriculture Sample Survey

To check the validity and stability of the clusters identified above, we applied the same clustering algorithm to the 2007 Agriculture Sample Survey (ASS) of Tanzania. The data contain 810 observations across 54 villages in Kilombero and Ulanga districts. The selection of variables and algorithms is the same as in the analysis above. However, the ASS data lack two important variables: per capita income and the amount of labour used in crop production.

5.1 PCA

[Figures: PCA results for the 2007 Agriculture Sample Survey data, shown in separate panels for all dimensions and for Dimensions 1 to 5.]

5.2 HCPC

5.2.1 Box plot for the variables and farm type

6 Session Info

R version 3.6.3 (2020-02-29)

Platform: x86_64-pc-linux-gnu (64-bit)

locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=en_US.UTF-8, LC_MONETARY=de_DE.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=de_DE.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=de_DE.UTF-8 and LC_IDENTIFICATION=C

attached base packages:

  • stats
  • graphics
  • grDevices
  • utils
  • datasets
  • methods
  • base

other attached packages:

  • rgdal(v.1.4-7)
  • sp(v.1.3-1)
  • here(v.0.1)
  • leaflet(v.2.0.2)
  • viridis(v.0.5.1)
  • viridisLite(v.0.3.0)
  • knitr(v.1.25)
  • MASS(v.7.3-51.5)
  • ineq(v.0.2-13)
  • likert(v.1.3.5)
  • xtable(v.1.8-4)
  • DT(v.0.9)
  • reshape2(v.1.4.3)
  • splines2(v.0.2.8)
  • ggpubr(v.0.2.3)
  • magrittr(v.1.5)
  • kableExtra(v.1.1.0)
  • summarytools(v.0.9.4)
  • corrplot(v.0.84)
  • clustertend(v.1.4)
  • psych(v.1.8.12)
  • stargazer(v.5.2.2)
  • ade4(v.1.7-13)
  • cluster(v.2.1.0)
  • FactoMineR(v.1.42)
  • factoextra(v.1.0.5)
  • rhandsontable(v.0.3.7)
  • haven(v.2.1.1)
  • Hmisc(v.4.3-0)
  • Formula(v.1.2-3)
  • survival(v.3.1-8)
  • lattice(v.0.20-40)
  • foreign(v.0.8-75)
  • tidyr(v.1.0.0)
  • ggthemes(v.4.2.0)
  • scales(v.1.0.0)
  • RColorBrewer(v.1.1-2)
  • plotly(v.4.9.0)
  • dplyr(v.0.8.3)
  • ggplot2(v.3.2.1)

loaded via a namespace (and not attached):

  • colorspace(v.1.4-1)
  • ggsignif(v.0.6.0)
  • pryr(v.0.1.4)
  • ellipsis(v.0.3.0)
  • rprojroot(v.1.3-2)
  • htmlTable(v.1.13.2)
  • base64enc(v.0.1-3)
  • rstudioapi(v.0.10)
  • ggrepel(v.0.8.1)
  • lubridate(v.1.7.4)
  • xml2(v.1.2.2)
  • codetools(v.0.2-16)
  • splines(v.3.6.3)
  • leaps(v.3.0)
  • mnormt(v.1.5-5)
  • zeallot(v.0.1.0)
  • jsonlite(v.1.6)
  • shiny(v.1.4.0)
  • readr(v.1.3.1)
  • compiler(v.3.6.3)
  • httr(v.1.4.1)
  • backports(v.1.1.5)
  • assertthat(v.0.2.1)
  • Matrix(v.1.2-18)
  • fastmap(v.1.0.1)
  • lazyeval(v.0.2.2)
  • later(v.1.0.0)
  • acepack(v.1.4.1)
  • htmltools(v.0.4.0)
  • tools(v.3.6.3)
  • gtable(v.0.3.0)
  • glue(v.1.3.1)
  • Rcpp(v.1.0.2)
  • vctrs(v.0.2.0)
  • nlme(v.3.1-144)
  • crosstalk(v.1.0.0)
  • xfun(v.0.10)
  • stringr(v.1.4.0)
  • rvest(v.0.3.4)
  • mime(v.0.7)
  • lifecycle(v.0.1.0)
  • dendextend(v.1.12.0)
  • hms(v.0.5.2)
  • promises(v.1.1.0)
  • parallel(v.3.6.3)
  • yaml(v.2.2.0)
  • gridExtra(v.2.3)
  • pander(v.0.6.3)
  • rpart(v.4.1-15)
  • latticeExtra(v.0.6-28)
  • stringi(v.1.4.3)
  • checkmate(v.1.9.4)
  • rlang(v.0.4.1)
  • pkgconfig(v.2.0.3)
  • matrixStats(v.0.55.0)
  • bitops(v.1.0-6)
  • evaluate(v.0.14)
  • purrr(v.0.3.3)
  • labeling(v.0.3)
  • rapportools(v.1.0)
  • htmlwidgets(v.1.5.1)
  • tidyselect(v.0.2.5)
  • ggsci(v.2.9)
  • plyr(v.1.8.4)
  • bookdown(v.0.18)
  • R6(v.2.4.0)
  • magick(v.2.2)
  • pillar(v.1.4.2)
  • withr(v.2.1.2)
  • scatterplot3d(v.0.3-41)
  • RCurl(v.1.95-4.12)
  • nnet(v.7.3-13)
  • tibble(v.2.1.3)
  • crayon(v.1.3.4)
  • rmarkdown(v.2.1)
  • grid(v.3.6.3)
  • data.table(v.1.12.6)
  • rmdformats(v.0.3.6)
  • forcats(v.0.4.0)
  • digest(v.0.6.22)
  • flashClust(v.1.01-2)
  • webshot(v.0.5.1)
  • httpuv(v.1.5.2)
  • munsell(v.0.5.0)
  • tcltk(v.3.6.3)

References

Arnold, Jeffrey B. 2018. Ggthemes: Extra Themes, Scales and Geoms for ’Ggplot2’. https://CRAN.R-project.org/package=ggthemes.

Bache, Stefan Milton, and Hadley Wickham. 2014. Magrittr: A Forward-Pipe Operator for R. https://CRAN.R-project.org/package=magrittr.

Bryer, Jason, and Kimberly Speerschneider. 2016. Likert: Analysis and Visualization Likert Items. https://CRAN.R-project.org/package=likert.

Chessel, D., A.B. Dufour, and J. Thioulouse. 2004. “The Ade4 Package-I- One-Table Methods.” R News 4: 5–10.

Comtois, Dominic. 2018. Summarytools: Tools to Quickly and Neatly Summarize Data. https://CRAN.R-project.org/package=summarytools.

Dahl, David B. 2016. Xtable: Export Tables to Latex or Html. https://CRAN.R-project.org/package=xtable.

Dray, S., and A.B. Dufour. 2007. “The Ade4 Package: Implementing the Duality Diagram for Ecologists.” Journal of Statistical Software 22 (4): 1–20.

Dray, S., A.B. Dufour, and D. Chessel. 2007. “The Ade4 Package-II: Two-Table and K-Table Methods.” R News 7 (2): 47–52.

Dunteman, George H. 1989. Principal Components Analysis. 69. Sage.

Everitt, Brian S, Sabine Landau, Morven Leese, and Daniel Stahl. 2011. “Hierarchical Clustering.” Cluster Analysis, 5th Edition. Wiley Online Library, 71–110.

Harrell Jr, Frank E, with contributions from Charles Dupont, and many others. 2018. Hmisc: Harrell Miscellaneous. https://CRAN.R-project.org/package=Hmisc.

Hlavac, Marek. 2018. Stargazer: Well-Formatted Regression and Summary Statistics Tables. Bratislava, Slovakia: Central European Labour Studies Institute (CELSI). https://CRAN.R-project.org/package=stargazer.

Husson, François, Sébastien Lê, and Jérôme Pagès. 2017. Exploratory Multivariate Analysis by Example Using R. Chapman and Hall/CRC.

Kassambara, Alboukadel. 2017. Ggpubr: ’Ggplot2’ Based Publication Ready Plots. https://CRAN.R-project.org/package=ggpubr.

Kassambara, Alboukadel, and Fabian Mundt. 2017. Factoextra: Extract and Visualize the Results of Multivariate Data Analyses. https://CRAN.R-project.org/package=factoextra.

Lê, Sébastien, Julie Josse, and François Husson. 2008. “FactoMineR: A Package for Multivariate Analysis.” Journal of Statistical Software 25 (1): 1–18. https://doi.org/10.18637/jss.v025.i01.

Maechler, Martin, Peter Rousseeuw, Anja Struyf, Mia Hubert, and Kurt Hornik. 2018. Cluster: Cluster Analysis Basics and Extensions.

Neuwirth, Erich. 2014. RColorBrewer: ColorBrewer Palettes. https://CRAN.R-project.org/package=RColorBrewer.

Owen, Jonathan. 2018. Rhandsontable: Interface to the ’Handsontable.js’ Library. https://CRAN.R-project.org/package=rhandsontable.

R Core Team. 2017. Foreign: Read Data Stored by ’Minitab’, ’S’, ’Sas’, ’Spss’, ’Stata’, ’Systat’, ’Weka’, ’dBase’, ... https://CRAN.R-project.org/package=foreign.

Revelle, William. 2018. Psych: Procedures for Psychological, Psychometric, and Personality Research. Evanston, Illinois: Northwestern University. https://CRAN.R-project.org/package=psych.

Sarkar, Deepayan. 2008. Lattice: Multivariate Data Visualization with R. New York: Springer. http://lmdvr.r-forge.r-project.org.

Sievert, Carson, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec, and Pedro Despouy. 2017. Plotly: Create Interactive Web Graphics via ’Plotly.js’. https://CRAN.R-project.org/package=plotly.

Therneau, Terry M., and Patricia M. Grambsch. 2000. Modeling Survival Data: Extending the Cox Model. New York: Springer.

Therneau, Terry M. 2015. A Package for Survival Analysis in S. https://CRAN.R-project.org/package=survival.

Venables, W. N., and B. D. Ripley. 2002. Modern Applied Statistics with S. Fourth. New York: Springer. http://www.stats.ox.ac.uk/pub/MASS4.

Wang, Wenjie, and Jun Yan. 2017. splines2: Regression Spline Functions and Classes. https://CRAN.R-project.org/package=splines2.

Wei, Taiyun, and Viliam Simko. 2017. R Package "Corrplot": Visualization of a Correlation Matrix. https://github.com/taiyun/corrplot.

Wickham, Hadley. 2007. “Reshaping Data with the reshape Package.” Journal of Statistical Software 21 (12): 1–20. http://www.jstatsoft.org/v21/i12/.

———. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.

———. 2017. Scales: Scale Functions for Visualization. https://CRAN.R-project.org/package=scales.

Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2018. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.

Wickham, Hadley, and Lionel Henry. 2018. Tidyr: Easily Tidy Data with ’Spread()’ and ’Gather()’ Functions. https://CRAN.R-project.org/package=tidyr.

Wickham, Hadley, and Evan Miller. 2018. Haven: Import and Export ’Spss’, ’Stata’ and ’Sas’ Files. https://CRAN.R-project.org/package=haven.

Xie, Yihui. 2014. “Knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman and Hall/CRC. http://www.crcpress.com/product/isbn/9781466561595.

———. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman; Hall/CRC. https://yihui.name/knitr/.

———. 2018a. DT: A Wrapper of the Javascript Library ’Datatables’. https://CRAN.R-project.org/package=DT.

———. 2018b. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.name/knitr/.

YiLan, Luo, and Zeng RuTong. 2015. Clustertend: Check the Clustering Tendency. https://CRAN.R-project.org/package=clustertend.

Zeileis, Achim. 2014. Ineq: Measuring Inequality, Concentration, and Poverty. https://CRAN.R-project.org/package=ineq.

Zeileis, Achim, and Yves Croissant. 2010. “Extended Model Formulas in R: Multiple Parts and Multiple Responses.” Journal of Statistical Software 34 (1): 1–13. https://doi.org/10.18637/jss.v034.i01.

Zhu, Hao. n.d. KableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax.

Bisrat H. Gebrekidan*

Dr. Sebastian Rash

Prof. Dr. Thomas Heckelei
ILR, University of Bonn

6/2/2017